DA-2P020401A728 

AMCMS  Code:  5910.21.63071 

HDL Proj: 01220


AD 


TR-1392 
Part III

AUTOMATION OF THE ABC SYSTEM
Part III. Appendixes, Charts, and Illustrations



Berthold  Altmann 
Walter  A.  Riessler 


August  1968 


H • D • L


US ARMY MATERIEL COMMAND

HARRY DIAMOND LABORATORIES

WASHINGTON, D.C. 20436


THIS DOCUMENT HAS BEEN APPROVED FOR PUBLIC RELEASE
AND SALE; ITS DISTRIBUTION IS UNLIMITED




CONTENTS

ABSTRACT

ACKNOWLEDGMENT

PART I. LINGUISTIC PROBLEMS AND OUTLINE OF PROTOTYPE TEST

A. Introduction

B. Linguistics

   a. The Linguistic Problem of Information Retrieval
   b. Coordinate-Index Type Systems
   c. The ABC Method
   d. General Considerations
   e. Definitions
      1. Terminology Used to Describe the Natural Language
      2. Document vs Information Retrieval
   f. Semantic Theory
   g. The American Psychological Approach
   h. Syntax and Semantics: The Thinking-Psychological Approach
   i. The Thinking Machine
   j. Discussion
   k. Focalization upon Specific Requirements

C. The Prototype Automated ABC Retrieval System

   a. Test Collection
   b. Processing for Automation
   c. The Measuring Tool and Its Applications
   d. Statistical Problems
   e. Elements Tested
   f. Impact of the Number of Categories in a Given System
   g. On-Line Retrieval

PART II. PROTOTYPE TEST (DESIGN AND ANALYSIS) AND PROCESSING FOR
SECOND-GENERATION MODEL

D. Model and Statistical Analysis of Automatic Prototype Test

   a. Evaluation of the Results
   b. The Recall-Relevance Relationship
   c. The Standard Deviation
   d. Decision about Relevance Formula
   e. Smoothing Procedures
   f. Reduced Evaluation Scales
   g. Effect of the Number of Categories
   h. Discrimination Power of a Category
   i. Experiments for Improving D

E. From Test Model to the Comprehensive Test and to the
   Second-Generation ABC System

   a. The Categories and Their Applications
   b. The Worksheet Approach
   c. The ABC Retrieval Methods

F. Conclusions and Projections

POSTSCRIPT

REFERENCES

PART III. APPENDICES, CHARTS, AND ILLUSTRATIONS

Appendix I. Computer Files and Programs
Appendix II. The Second-Generation ABC Dictionary
Appendix III. The Mechanical Standardization Process for ABC Descriptors
Appendix IV. Derivation of the Distribution Formula
Appendix V. Participants in the Construction of the Test Collection

Chart A  Derivation of Test Collection and Queries
Chart B  Categories Used for Automated Test
Chart C  Dependence of D (Deficiency) on Progressive Acceptance
         of Decreasingly Relevant Documents as Relevant Ones
Chart D  Preliminary Worksheet
Chart E  Worksheet for Structured Abstracts
Chart F  Thesaurus Automatically Derived from Input
Chart G  Flowchart for Automatic Standardization of Syntagmas
         (ABC Descriptors)
Chart H  Filter Codes
Chart I  Selective Dissemination Worksheets
Chart J  Sample Page of Second-Generation ABC Dictionary

FIGURES

1.  Two-dimensional presentation of ranked order output
2.  Normalized ranked-order output as presented in Figure 1
3.  Relevance-recall curves derived from formula (1)
4.  Recall-relevance curves as function of [symbol illegible]K and πK
5.  K as function of D
6.  Integral distribution of the D_i's obtained in 3 test runs
7.  Normalized standard deviation of D plotted vs the
    corresponding D of several test runs
8.  D_i's of the 50 queries vs numbers of responsive documents
    for one test run
9.  Frequency of evaluation numbers used by different evaluators
    (sample)
10. First smoothing method applied to the vectors of one evaluator
11. Effect of reducing the 10-valued to a 2-valued scale upon
    the deficiency
12. Effect of number of evaluation grades upon D
13. D vs number of applied categories
14. The effectiveness of the individual categories vs their
    frequency within the document vectors
15. ΔD (change of D by dropping category c) vs W_c (efficiency of
    category c)
16. Changes of D when the retrieval formula is modified by
    weight factors W0

DD Form 1473, last page, Part III




APPENDIX  I 

COMPUTER FILES AND PROGRAMS

It is obvious that the numerous mechanical retrieval runs performed
during the test of the prototype automated system required a large amount
of programming effort. Because this effort, valuable as it is, will
not contribute to the day-to-day operation of the second-generation
system, it has been omitted from the following list, which contains only
such computer files and programs as will assure the effectiveness and
efficiency of the ABC storage and retrieval method. The different
programs, written in COBOL, are elements of one integrated system, ready
to be compiled, to perform all required operations, and to produce all
required products in one continuous process [67].

The current computer programs, completed or nearing completion, produce
and maintain the following magnetic tape files:

1.  the ABC descriptors in their unrotated form by alphabetical
accession and identification code

2.  the ABC descriptors alphabetically sequenced by rotated
keywords as a result of KWIC-program processing

3.  the dated ABC descriptors withdrawn from the active file by
subject terminology or subject-date combination, in unrotated form
as in File No. 1

4.  the ABC descriptors related to selected subject categories
in the format of File No. 2

5.  the complete bibliographic information (subdivisions by fields:
personal author, corporate author, title, contract or project number,
year, security classification, each one directly accessible) for the
active library holdings; its primary organization by type of publication:
reports, books, periodicals and periodical articles, etc.; and its
secondary organization by shelf or accession numbers

6.  the inventory of all bound book and periodical volumes by
shelf number

7.  the title information withdrawn from the active File No. 5

8.  authority list of corporate author names and name variations
organized by identifying code

9.  index to periodical titles organized by the procurement source

10. category names arranged by category codes

11. asterisk terms and descriptor ABC codes: an index referring
to the full-length ABC individual descriptors as well as to the catalog
entries of corresponding titles

12. SDI printouts: user profiles accept (a) a combination of
one specified category and of (b) keywords in one particular category
dictionary
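
The rotated-keyword file (File No. 2) can be pictured with a short sketch. The Python below is only an illustration of KWIC-style rotation and is not the COBOL programs referred to in this appendix. The function names `kwic_rotations` and `build_rotated_file`, the small stop list, the "/" wrap marker, and the sample descriptor and code are all assumptions introduced for the example.

```python
# Illustrative KWIC rotation. All names, the stop list, and the "/" marker
# are assumptions for this sketch; they are not taken from the report.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to", "with"}


def kwic_rotations(code, descriptor):
    """Return (keyword, rotated text, code) entries, one per non-stop word."""
    words = descriptor.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue  # blocked words never open a keyword cluster
        # Rotate so the keyword leads; "/" marks where the original text began.
        rotated = " ".join(words[i:] + ["/"] + words[:i])
        entries.append((word.lower(), rotated, code))
    return entries


def build_rotated_file(descriptors):
    """Sequence all rotations alphabetically by keyword, then by rotated text."""
    entries = []
    for code, text in descriptors:
        entries.extend(kwic_rotations(code, text))
    return sorted(entries)


if __name__ == "__main__":
    sample = [("CFA", "Design of parallel spiral antenna in fuze")]
    for keyword, rotated, code in build_rotated_file(sample):
        print(f"{keyword:<10} {rotated}  [{code}]")
```

Each descriptor yields one entry per significant word, so a single statement is reachable from every keyword cluster it contributes to.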

The programs listed above generate the following printouts:

1.  lists of unrotated ABC descriptors arranged by codes and
indirectly by date

2.  alphabetical index to terms contained in hyphenated phrases

3.  ABC descriptor dictionaries in rotated format in alphabetical
sequence by keyword, for the entire collection, for categories, and for
retired materials

4.  accession bulletins in two-column printout with category
headings inserted from File 10 above

5.  complete sets of catalog cards ready for filing into personal
author, corporate author, title, subject, parameter, ABC-subject,
contract or project number, AD-number, etc., catalogs and shelf lists

6.  rotated title list (one-line KWIC program)

7.  cumulative document title catalog in book form by accession
or bulletin number

8.  cumulative rotated title listing for books, reports, etc.
in KWIC format

9.  lists of periodical holdings

10. periodical title lists by renewal date and vendors' names

11. punched cards for all issues of subscribed periodicals

12. lists of titles withdrawn from current files

13. catalogs for historical files

14. catalogs of all titles requiring downgrading of security
classification

15. corporate author authority lists in 3 x 5 card format

16. corporate-author cross references on 3 x 5 cards to be filed
in catalog

17. periodical vendor lists by vendor code as well as by vendor
name

18. texts of subject categories used as headings in accession
bulletins and as titles of category ABC dictionaries

19. title catalogs arranged by ABC descriptor code

20. title catalogs arranged by shelf number

21. lists of periodical issues not received (to be mailed to
vendor)

22. frequency counts of documents broken down by ABC code within
each category

23. lists of keywords by categories with frequency counts of related
documents and descriptors (documents as well as descriptors are identified)
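
Printout 23 (keywords by category with frequency counts of related documents) can be illustrated as follows. This is a minimal sketch under stated assumptions: the record layout `(category, document_id, descriptor_text)`, the function name `keyword_counts_by_category`, and the stop list are hypothetical and are not drawn from the programs described here.

```python
from collections import Counter, defaultdict

# Assumed stop list; the report's actual list of blocked words is not given.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to", "with"}


def keyword_counts_by_category(records):
    """Count, per category, how many distinct documents use each keyword.

    records: iterable of (category, document_id, descriptor_text) tuples.
    Returns {category: Counter({keyword: number_of_documents})}.
    """
    docs_per_keyword = defaultdict(lambda: defaultdict(set))
    for category, doc_id, text in records:
        for word in text.lower().split():
            if word not in STOP_WORDS:
                docs_per_keyword[category][word].add(doc_id)
    return {
        category: Counter({word: len(ids) for word, ids in words.items()})
        for category, words in docs_per_keyword.items()
    }


if __name__ == "__main__":
    sample = [
        ("radar", "D1", "chirp radar fuze"),
        ("radar", "D2", "radar antenna"),
        ("antenna", "D2", "spiral antenna"),
    ]
    for category, counts in keyword_counts_by_category(sample).items():
        print(category, counts.most_common())
```

Counting distinct documents rather than raw word occurrences keeps a keyword that is repeated within one descriptor from inflating its frequency.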

The category term dictionaries and their frequency counts will be
used (a) to assist the professional man in entering the system from a
terminal provided with teletype or electronic display reception
equipment; (b) to develop vector numbers by a mechanical process that is to
replace the psychometric evaluations and to organize the content of
the collection under the assumption that this preparation and integration
of information establishes multiple links to subjects of interest and
prepares a spatial distribution or classification for effective mechanical
retrieval; (c) to produce continuously changing measures of relatedness
as well as of significance, to automate step by step the development of
and the assignment to categories; and (d) to standardize the descriptor
phrases, their linguistic components, and their syntax by performing as
a computerized effort the editorial changes of the analytical statements
(ABC descriptors) through comparison with the information stored in
a variety of formats, by categories, types of associations, and terms
and phrases with frequency counts, so that human effort ultimately
will be required only to supplement what is missing in current listings
or stored tables.
APPENDIX II

SECOND-GENERATION ABC DICTIONARY

Of the numerous computer programs that have been developed, the
one to generate the second-generation ABC dictionary deserves particular
attention.

The program is operational. A sample of the dictionary is
shown as chart J. The following characteristics of the dictionary are
worth noting:

1.  The legibility is improved by eliminating the throw-back.
Instead, overflow is printed on the subsequent line rather than on the
same line. Markers clearly indicate the start of each descriptor.

2.  The text of the descriptor following the keyword in the index
window is alphabetized for 30 digits to organize the ABC descriptors
within the keyword cluster by use of standardized connectors such as
prepositions, participles, and infinitives.

3.  The maximum length of the descriptors is increased to 450
characters to improve the descriptions and to forego the hyphenated
phrases that were necessary for brevity with the first-generation
format.

4.  When any term fails to make a substantial contribution to the
meaning of a given descriptor, that descriptor is not assigned to the
cluster for the deficient term. This restriction, which depends on the
decision of the analysts, is essential in confining approaches to useful
information and is utilized in addition to the stop list of permanently
blocked words.

5.  Of the other terms that are also eliminated to streamline and
compact the descriptors we list: (a) the parameters that form a separate
title index (by alphabetized names of materials, components, devices,
etc., subdivided by parameters and sub-subdivided by the numerical values
in ascending order); and (b) the filter codes (chart H) denoting the
types of publications, which are used to organize the title index
of the ABC subject dictionary.

6.  The size of the dictionary is further reduced by the use of
hyphenated phrases. Nevertheless, access to the not-alphabetized
(and therefore not cluster-forming) component words within these phrases
can be provided through a mechanically produced index of cross-references.

7.  About 40 to 50 ABC dictionaries in different, slightly
overlapping categories will be issued. This is a by-product of the
vector analysis performed (pages 50 to 51) for automated retrieval
operations. (The analysts relate each document to 10 to 15 of about 190
categories.) Compilation and printing of the ABC descriptor dictionaries
and the ABC term dictionaries for the entire collection and important
categories, the accession bulletins, current awareness listings, and various
forms of card and book catalogs are automated. The retrieval operations,
however, can be performed manually.*


*  This semi-automatic method might be preferable for smaller
organizations without computers of their own, especially if they have some
access to a computer center.
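
Characteristics 1 and 2 above (overflow printed on subsequent lines, and alphabetizing on the 30 positions that follow the keyword) can be sketched in a few lines of Python. The sketch is hedged: `SORT_WIDTH` comes from the text, while `LINE_WIDTH`, the `*` start marker, the two-space continuation indent, and both function names are assumptions made for illustration only.

```python
import textwrap

SORT_WIDTH = 30   # positions after the keyword that take part in sorting (from the text)
LINE_WIDTH = 60   # assumed print width; the report does not state one
MARKER = "*"      # assumed marker for the start of each descriptor


def cluster_sort_key(entry):
    """Sort key for (keyword, following_text): keyword, then first 30 characters."""
    keyword, following_text = entry
    return (keyword.lower(), following_text.lower()[:SORT_WIDTH])


def format_entry(keyword, following_text):
    """Print overflow on subsequent lines instead of throwing it back."""
    lines = textwrap.wrap(f"{keyword} {following_text}", width=LINE_WIDTH)
    return [MARKER + " " + lines[0]] + ["  " + line for line in lines[1:]]


if __name__ == "__main__":
    cluster = [
        ("antenna", "with immunity to nuclear-blast gamma radiation"),
        ("antenna", "in fuze"),
        ("antenna", "for chirp radar"),
    ]
    for keyword, text in sorted(cluster, key=cluster_sort_key):
        print("\n".join(format_entry(keyword, text)))
```

Because Python's sort is stable, descriptors that agree on all 30 compared positions simply keep their input order, which mirrors a fixed-width sort field.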
APPENDIX III

THE MECHANICAL STANDARDIZATION PROCESS FOR ABC DESCRIPTORS

To illustrate the mechanical standardization process we consider
the following one-sentence abstract of a paper: "A parallel spiral
antenna system with immunity to nuclear-blast gamma radiation was
designed for a chirp-radar fuze in a Nike-X antimissile missile." In
the process of transforming this sentence into an ABC descriptor, the
verb phrase "was designed" is replaced by the noun "design," so that
without change of meaning the ABC descriptor will read: "Design of a
parallel spiral antenna system with immunity, etc."

To standardize this or any other ABC descriptor and to produce
a structured description, two factors are of pre-eminent importance:
the logical sequence of the component ideas and their verbal expressions,
and the consistency of connectors (prepositions, participles, infinitives,
etc.) which combine the component ideas with each other and produce
the conceptual entity we call an ABC structured abstract.

The HDL method is implemented by providing a questionnaire for the
research analyst. The questions must be broad enough to cover every
subject detail encountered, and they must be arranged in a sequence in
which concepts are strung together to form logical, unambiguous ABC
descriptions. Moreover, each individual question must relate to one
standardized connector in order that the computer may produce structured
(ABC) descriptions from the input of the encoded answers which the
analysts have recorded on the questionnaire. In filling out the
questionnaire the analyst will consider the questions merely as constant
reminders to cover all essential aspects that a given document may
contain; but he will remain completely unaware that the questions, arranged
by letters or numbers, will encode his answers, and that the code will
determine not only the order but also the connector the computer will
use in compiling, completing, and printing the English-language
description. The analyst will also not be concerned with such details as
whether some questions must precede, and others follow, the main
subject of the description in order to generate the standardized sequence
of the component elements.

For a very general understanding of the properties and the potential
of the structured-abstract approach we present on chart D the still
experimental questionnaire currently used by our analysts.

Furthermore, the manual will contain references that lead the analyst
to the particular category-term dictionary in which pertinent standardized
terminology has been organized by computer.
The actual questionnaire form which the research analyst will
interpret is a matrix with a number of columns, to assure unambiguous
answers. On chart E we illustrate the worksheet completed by an analyst
who processed the subject content of the paper listed above. The first
column is used to record all the answers relating directly to the
main subject, "antenna"; the second column (2a) records the answers
related to (or modifying) the term "fuze"; the third column (2b)
provides an answer that modifies the term "radiation"; and the last
column (3) modifies the word "missile."

The governing subject of the main set as well as of the subsets
is indicated by a subsequent colon. All terms and phrases in each set
or subset (divided by hyphens) are directly and exclusively related to
the respective governing subject.

When these are placed into sequence and combined by standardized connector
terminology, the computer will produce an ABC structured abstract that
reads: "Design study of parallel spiral antenna: in fuze, resistant
to gamma radiation fuze: in anti-missile, using Chirp radar radiation:
produced by nuclear blast missile: Nike-X."*

To eliminate any misunderstanding we repeat that in this example
both phrases, "in anti-missile" and "using Chirp radar," are modifiers
of the term "fuze."


*  The  described  process  has  been  computerized  and  tested. 
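
The way coded answers and standardized connectors combine into one description can be sketched as follows. This is an illustration only, not the computerized process the footnote refers to. The connector wording is taken from the preliminary worksheet (chart D); the placement rule (code A in front with "of", adjective codes B, C, and D in front of the subject, all other codes behind it), the function name `compose`, and the sample answers are assumptions made for the example.

```python
# Connectors as worded on the preliminary worksheet (chart D); only a subset is shown.
CONNECTORS = {
    "E": "produced by",
    "F": "influenced by",
    "G": "related to",
    "H": "being part of",
    "J": "designated",
    "Q": "for",
    "R": "resistant to",
    "S": "vulnerable to",
    "W": "using",
}
# Assumption: adjective-phrase codes are placed in front of the subject.
ADJECTIVE_CODES = ("B", "C", "D")


def compose(subject, answers):
    """Build a structured description from (code, text) worksheet answers."""
    head = [text for code, text in answers if code == "A"]
    adjectives = [text for code, text in answers if code in ADJECTIVE_CODES]
    tail = [f"{CONNECTORS[code]} {text}" for code, text in answers if code in CONNECTORS]
    parts = []
    if head:
        parts.append(f"{head[0]} of")  # code A carries the connector "of"
    parts.extend(adjectives)
    parts.append(subject)
    phrase = " ".join(parts)
    if tail:
        phrase += " " + ", ".join(tail)
    return phrase


if __name__ == "__main__":
    answers = [("A", "Design"), ("C", "parallel"), ("C", "spiral"),
               ("H", "fuze"), ("R", "gamma radiation")]
    print(compose("antenna", answers))
```

The analyst supplies only the answers; the code letter alone selects both the connector and the position of each answer relative to the subject.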
APPENDIX IV

DERIVATION OF THE DISTRIBUTION FORMULA

To acquire a ranking function that gives the distribution of the
relevant documents of set B among the non-relevant set A, we divided
the procedure into steps in the following way. We first ranked
B within the subset $A_1$ of A. Second, we repeated this process with
$A_1$ in the subset $A_2$ of A, and so on. Finally we ranked within A and
completed a ranking of B in A. For each step we used the same ranking
function $f(x)$. The location of a document originally located at $x$
will be, after the first ranking procedure:

$$x_1 = f(x) = f_1(x)$$

After the $i$th ranking its location will be at

$$x_i = f(x_{i-1}) = f_i(x)$$

$f_{n+1}$ can be derived from $f_n$ through the operation

$$f_{n+1}(x) = f_n(f(x))$$

We assume that $f(x)$ deviates from $f(x) = x$ only slightly and therefore
write

$$f(x) = x - \varepsilon g(x)$$

where

$$g(0) = g(1) = 0 \qquad (1)$$

For a sufficiently small $\varepsilon$ we can write:

$$f_{n+1}(x) = f_n(x - \varepsilon g(x)) = f_n(x) - \varepsilon g(x) f_n'(x)$$

or

$$\frac{f_{n+1}(x) - f_n(x)}{\varepsilon} = -g(x) f_n'(x) \qquad (2)$$

If $n$ is considered a high number and $\varepsilon$ a small one so that
$\varepsilon n = \nu$ is finite, we can transform the left side of equation (2)
into the differential ratio

$$\frac{\partial f}{\partial \nu}$$

so that equation (2) becomes $\partial f/\partial \nu = -g(x)\,\partial f/\partial x$.
For $g(x)$ we substitute the function $x(1 - x)$ (which may be the simplest
way to fulfill the condition of equation (1)) and arrive at

$$f = F\left(-\nu + \log\frac{x}{1-x}\right)$$

The still unknown function $F$ can be calculated because whenever $\nu = 0$
then $f(x) = x$. The result is

$$F(u) = \frac{1}{1 + e^{-u}}$$

and therefore

$$f_\nu(x) = y = \frac{1}{1 + \exp\left(\nu - \log\frac{x}{1-x}\right)}$$

an equivalent of

$$\left(\frac{1}{x} - 1\right) e^{\nu} = \frac{1}{y} - 1 \qquad (3)$$

Formula (3) shows a symmetry in $x$ and $y$. The value of $\nu$ can be replaced
by formula (3) on page 35.
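
As a plausibility check on the limiting argument, the repeated ranking step can be compared numerically with a closed form. The sketch below rests on stated assumptions: it takes the single-step function to be f(x) = x - εg(x) with g(x) = x(1 - x), sets ε = ν/n, and compares n repeated applications with y = 1/(1 + exp(ν - log(x/(1 - x)))). The function names and the sample values of x, ν, and n are illustrative and do not come from the report.

```python
import math


def g(x):
    """Perturbation term; vanishes at both ends, g(0) = g(1) = 0."""
    return x * (1.0 - x)


def iterate_ranking(x, nu, n):
    """Apply the single-step ranking f(x) = x - eps*g(x) n times, with eps = nu/n."""
    eps = nu / n
    for _ in range(n):
        x = x - eps * g(x)
    return x


def closed_form(x, nu):
    """Limiting location y = 1 / (1 + exp(nu - log(x / (1 - x))))."""
    return 1.0 / (1.0 + math.exp(nu - math.log(x / (1.0 - x))))


if __name__ == "__main__":
    x, nu = 0.5, 1.0
    for n in (10, 100, 10000):
        print(n, iterate_ranking(x, nu, n), closed_form(x, nu))
```

The iterated value approaches the closed form as n grows, which is the behavior the passage from the difference equation to the differential ratio relies on.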


The problem of the distribution of the relevant items among the
non-relevant ones in a given collection can also be treated as a
diffusion problem where only the product t of both the diffusion coefficient
and the diffusion time is the important element. The distribution
functions are more complex, but resemble those developed above. We have
preferred our method because of its mathematical simplicity. For
high ν's, corresponding to small D's, we have the relationship of our
D and t approximated by the formula:

D = [expression illegible in source], where π = 3.1416 . . .


APPENDIX  V 

PARTICIPANTS IN THE CONSTRUCTION OF THE TEST COLLECTION

Baba, A. J. (BS)
Boykin, C. (BS)
Colton, M. M. (PhD)
Colbert, R. (BS)
D'Angonu, D. S. (BS)
Friedberg, I. S. (MS)
Gibson, H. F. (MS)
Goldfarb, R. (BS)
Hine, M. K. (MA)
Isler, W. E. (MS)
Jautz, R. (BS)
Manion, F. M. (MS)
Marsh, D. S. (BS)
Mary, D. J. (BS)
McCall, T. D. (BS)
Mesrobian, A. (BS)
Meyler, O. L. (BS)
Miller, J. (BS)
North, G. D. (BS)
Patterson, M. S. (BS)
Redcay, P. W. (none)
Riessler, W. A. (PhD)
Sommer, H. (PhD)
Soper, W. L. (MS)
Tucker, R. W. (BS)
Tuttle, J. E. B. (BS)
Wall, R. E., ARC (PhD)
Watkins, S. (BS)
Williams, W. K. (BS)

PhD           4
MA or MS      6
BS           18
none          1
Total        29

Chart  A.  Derivation  of  test  collection  and  queries. 


Chart B. Categories for automated test.


1)  characteristics, parameters, data
2)  theory, analysis
3)  design, development
4)  electric, electromagnetic
5)  magnetic, ferrite
6)  acoustic, hypersonic, ultrasonic
7)  optical, light
8)  photoelectric
9)  magnetomechanical, piezo
10) thermal, temperature
11) tunneling (tunnel effect)
12) junction (space charge)
13) field effect
14) dielectric, ferroelectric
15) parametric (varactors,...)
16) linear devices and effects (resistors, capacitors,...)
17) diode, rectifier
18) transistor
19) non-linear devices and effects exc. nos. 17 and 18
20) functional units
21) circuit
22) instruments, equipment, appliances (ready for use)
23) systems
24) noise
25) interference, interaction
26) reliability, vulnerability, aging, ...
27) isolation, shielding
28) generation of any kind
29) conversion of any kind (energy, ...) transducer
30) amplification, gain
31) oscillation
32) modulation, demodulation
33) transmission, communication
34) detection
35) discrimination
36) synchronization, tuning, phase
37) stabilization
38) automatic control
39) simulation, analog
40) switching, logic
41) memory, storage
42) computer
43) radar, sonar
44) maser, laser
45) space science
46) military, weapons
47) manufacturing
48) miniaturization
49) thin-film
50) measuring, testing, observation
51) improvement
52) germanium, silicon, selenium
53) compound semiconductors
54) non-semiconducting material
55) power, energy
57) frequency
58) radio waves
59) microwaves, millimeter-waves


Chart C. Dependence of D (deficiency) on progressive acceptance
of decreasingly relevant documents as relevant ones.

NUMBER OF CLASSES   [heading      D in %    ERROR INTERVAL %
INCLUDED            illegible]

1                      1196       1.5       ±2.0
1 through 2           12532       1.7       ±0.7
1 through 3           31628       3.2       ±0.8
1 through 4           44641       3.5       ±0.8
1 through 5           54423       4.3       ±0.9
1 through 6           70446       3.8       ±0.7
1 through 7           85141       3.9       ±0.6
1 through 8           92906       4.4       ±0.7
1 through 9           96611       4.7       ±0.7
Chart D. Preliminary worksheet.

Code  Questions                                            Standard connector

      What is the main subject?
A     Form of publication? Work phase? Type of effort?     A of
B     Which are the properties, characteristics?           adjective phrase
C     What is the shape, form?                             adjective phrase
D     What is the physical phase?                          adjective phrase
E     How is it produced, caused? (Tool, method)           produced by E
F     What is it influenced by or changed by?              influenced by F
G     What is it related to?                               related to G
H     What is it a component part of?                      being part of H
I     What is it limited to?                               limited to I
Ia    What is excluded?                                    without Ia
J     What is it designated by?                            designated J
K     What is it simulated by?                             simulated by K
L     What is it modelled by?                              modelled by L
M     What materials does it consist of,                   of M
      or what materials is it related to?
N     What components does it consist of?                  of N
O     What devices does it consist of?                     of O
P     What instruments does it consist of?                 of P
Q     What is its purpose? What is it made for?            for Q
R     What is it resistant to?                             resistant to R
S     What is it vulnerable to?                            vulnerable to S
T     What does it cause or effect?                        resulting in T
U     On what does it have influence?                      of U
V     What is it performing or operating?                  participle
W     What principle, energy, or instrument is applied?    using W
X     What is the reason or cause?                         because of X
Y     What is it similar to?                               like Y
Z     What is its environment?                             in or at Z
AA    What is the location in space? (Table)               "in," "at" AA
AB    What is the location in time? (Table)                "during" AB
Chart E. Worksheet for structured abstracts.

[Worksheet form and punched-card images not reproducible. Legible
fields: DOCUMENT CODE; SHELF NUMBER; main subject; COLUMN 2 REFERS
TO: 1F; COLUMN 3 REFERS TO: 10; COLUMN 4 REFERS TO: 2F.]

THE FOLLOWING CARDS ARE PUNCHED FROM THE WORK SHEET.


Chart F. Thesaurus automatically derived from input
(short sample).

A.  Terms                  Modified by              Associative   ABC
                                                    Code          Code

    Antenna                Design                   A             CFA
                           Fuze                     H             CFA
                           Gamma Radiation          S             CFA
                           Parallel                 C             CFA
                           Spiral                   C             CFA
    Fuze                   Anti-Missile Missile     H             CFA
                           Chirp Radar              W             CFA
    Gamma Radiation        Nuclear Blast            E             CFA
    Missile                Nike-X                   J             CFA

B.  Terms                  Used as modifiers of

    Anti-Missile Missile   Fuze                     H             CFA
    Chirp Radar            Fuze                     W             CFA
    Design                 Antenna                  A             CFA
    Fuze                   Antenna                  H             CFA
    Gamma Radiation        Antenna                  S             CFA
    Nike-X                 Missile                  J             CFA
    Nuclear Blast          Gamma Radiation          E             CFA
    Parallel               Antenna                  C             CFA
    Spiral                 Antenna                  C             CFA
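
The two halves of the thesaurus sample are the same relation read in opposite directions, which a short sketch can make concrete. The Python below is illustrative only: the triple layout `(term, modifier, associative_code)` and the function names `modified_by` and `modifiers_of` are assumptions, and the sample data are copied from the chart.

```python
from collections import defaultdict

# (term, modifier, associative code) triples copied from the chart's sample.
SAMPLE = [
    ("Antenna", "Design", "A"),
    ("Antenna", "Fuze", "H"),
    ("Antenna", "Gamma Radiation", "S"),
    ("Antenna", "Parallel", "C"),
    ("Antenna", "Spiral", "C"),
    ("Fuze", "Anti-Missile Missile", "H"),
    ("Fuze", "Chirp Radar", "W"),
    ("Gamma Radiation", "Nuclear Blast", "E"),
    ("Missile", "Nike-X", "J"),
]


def modified_by(triples):
    """Group modifiers under each term: term -> sorted [(modifier, code)]."""
    table = defaultdict(list)
    for term, modifier, code in triples:
        table[term].append((modifier, code))
    return {term: sorted(rows) for term, rows in sorted(table.items())}


def modifiers_of(triples):
    """Invert the relation: sorted rows of (modifier, modified term, code)."""
    return sorted((modifier, term, code) for term, modifier, code in triples)


if __name__ == "__main__":
    for term, rows in modified_by(SAMPLE).items():
        print(term, rows)
    for row in modifiers_of(SAMPLE):
        print(row)
```

Storing each association once and deriving both listings from it keeps the two views consistent whenever a new descriptor is added.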

Chart G. Flowchart for automatic standardization of
syntagmas (ABC descriptors).


I. ORIGINAL ABSTRACT

Crystals produced by the Czochralski process frequently show twin boundaries. An experiment
was conducted by which twins in Lanthanum - Aluminum - Oxide were removed by the
application of modest stress at room temperature. The paper describes the detwinning method.


II. CONTENT ANALYSIS

A. Tree Presentation

B. Format of Computer-Stored Analysis (taken from worksheet)

1A    Description      2B      Modest              3C       Single
1B    Mechanical       2/1E*   Stress              3/1AA*   Crystal
1F    Detwinning       2I      Room Temperature    3E       Czochralski
1E    Stress                                       3M       La - Al - Oxide
1AA   Crystal


III. COMPUTER PROCESSING METHOD (transformation of II.B)

Step 1: Introduction of Standard Connectors and Preliminary Arrangement

Description of Mechanical Detwinning produced by
    (1E) Stress
        (2B/1E) Modest
        (2I/1E) Room-Temperature
    in
    Crystal
        (3C/1AA) Single
        (3E/1AA) Produced by Czochralski Process
        (3M/1AA) Lanthanum - Al - Oxide

Step 2: Intermediary Process: Modifiers and their standard connectors are properly sequenced, those
coded A-E in front of, all others behind, the term that is modified.

Step 3: Standardized Computer Product: "Description of mechanical detwinning produced by modest stress
at room temperature*, in single crystal produced by Czochralski
Process of Lanthanum - Aluminum - Oxide".


*  The comma indicates that the subsequent phrase is directly related to the main subject
("detwinning"), but its component elements refer to the main subject of this particular phrase ("crystal").


Chart H. Filter codes.


State-of-the-art surveys
Bibliography, abstracts
Collections, proceedings
Design, development and engineering studies and reports
Elementary, popular, introductory studies
Feasibility studies
Graphs and tables
Historical studies
Dictionaries, lexicons
Computer programs and simulation
Mathematical and statistical studies
Production engineering
Research: applied and theoretical
Standards, specifications
Tests: laboratory
Tests: field
Test equipment and procedures
Chart J. Sample page of second-generation ABC dictionary.


Figure 1. Two-dimensional presentation of ranked order output.
Figure 2. Normalized ranked-order output as presented in figure 1.
Figure 3. Relevance-recall curves derived from formula (1).

[Plot residue: axis label "RECALL/(symbol illegible) IN %".]

Figure 4. Recall-relevance curves as functions of [symbol illegible]K and πK
([symbol illegible] is the relevance of the entire collection with
respect to a particular query and π is the ratio
of the documents withdrawn).

Figure 6. Integral distribution of the D_i's obtained in 3 test runs
(a Gaussian distribution would appear as a straight line
in this presentation).

Figure 8. D_i's of the 50 queries vs numbers of
responsive documents. [Axis label: NUMBER OF RELEVANT DOCUMENTS.]

[Plot residue: axis labels "FREQUENCY", "EVALUATIONS", "DEFICIENCY D",
"AVERAGE NUMBER OF APPLIED CATEGORIES".]

Figure 14. The effectiveness of the individual categories vs
their frequency within the document vectors.
[Axis labels: FREQUENCY f; (W_c) average.]
11. SUPPLEMENTARY NOTES


12. SPONSORING MILITARY ACTIVITY


13. ABSTRACT

To advance the ABC system toward the automation of its retrieval and analytical
input operations, linguistic problems were studied, and a prototype computerized
retrieval test was conducted. A vector-type organization was imposed on the test
collection.

An appropriate measuring tool was constructed and used (a) to evaluate a variety
of system parameters (ca 50 test runs were required) and (b) to rate different systems
that evolve from the basic ABC model.

The process of computerizing the standardization of ABC descriptors as well as the
production of a comprehensive thesaurus (presenting terminology with associations and
functions) are described and so are the methods prepared for progressive automation
of the analytical effort in future test models.